Introduction

The alphaviruses are a widespread group of human pathogens that are
present virtually everywhere in the world (Griffin, 1986; Monath, 1988;
Peters, 1990). They are mosquito-borne viruses that replicate in the
mosquito vector as well as in the human host or in various species of
birds and mammals. Old World alphaviruses are, in general, capable of causing
fever, rash, and arthralgia in man that may be very painful and disabling for
extended periods of time. In the case of the Ockelbo strain of Sindbis virus and of
Ross River virus, this arthralgia manifests as a polyarthritis that may in some
cases last for months or years. Many of the New World alphaviruses can cause
fatal encephalitis in man. Our program attempts to understand the molecular
basis of alphavirus immunogenicity and to determine the relationships of
alphaviruses and strains of alphaviruses to one another.

In our last report we described the localization of a site in alphavirus
glycoprotein E2 that binds neutralizing antibodies. Knowledge of
immunogenic domains is important in developing vaccines. Neutralizing
antibodies are thought to be particularly important in protecting a vaccinee from
viral infection. We developed a novel approach in which λgt11 expression
libraries were constructed that expressed parts of the Sindbis genome, and these
were screened with neutralizing monoclonal antibodies. Many neutralizing
antibodies react with discontinuous epitopes and thus will not react with a
chimeric protein expressed in a λgt11 library. However, we did succeed in
identifying one antibody which bound to specific clones within the λgt11 library
(Wang and Strauss, 1991). Thus we were able to demonstrate directly that this
neutralizing monoclonal antibody bound to glycoprotein E2 of Sindbis virus
between residues 173 and 220. This approach confirmed and extended results in
which variants of the virus selected to be resistant to neutralizing monoclonal
antibodies were sequenced in order to identify the regions within the glycoproteins
of the virus with which the antibodies react (Strauss et al., 1991). We thus
identified the domain between residues 170 and 220 of glycoprotein E2 of
alphaviruses as being particularly important for the antibody response of a host.

We have also reported on the sequence analysis of a number of strains of
Sindbis virus or of viruses related to Sindbis virus, in order to understand the
relationships of these viruses to one another. We found that the strain of Sindbis
virus from Northern Europe that causes Ockelbo disease in Sweden, Pogosta
disease in Finland, or Karelian fever in Russia, a disease characterized by a
polyarthritis whose symptoms can persist for months or years, is very closely
related to pathogenic strains of Sindbis virus isolated from South Africa. We
concluded that a South African strain of Sindbis was introduced into Northern
Europe, probably in the 1960s, where it continues to cause epidemics of a
significant human disease (Shirako et al., 1991). We have also reported on
sequences of a number of other Sindbis-like viruses in order to determine the
relationships of these viruses to one another. In this report, we present sequence
data for a Sindbis-like virus isolated from New Zealand, Whataroa virus, and a
virus from South America, Aura virus, which has been isolated from Brazil and
from Argentina. We have been particularly interested in Aura virus because it
might represent the parent of an emergent virus, Western equine encephalitis
virus.

Methods  Used 

Virus  Strains.  Whataroa  virus  and  Aura  virus  were  obtained  from  Dr.  J. 
M.  Dalrymple  of  USAMRIID.  Viruses  were  grown  and  purified  as  previously 
described  (Shirako  et  al.,  1991). 


cDNA Clones. cDNA clones were made in one of two ways. The first
method used standard procedures in which first strand cDNA was made using
oligo(dT) as primer and second strand synthesis was by the method of Gubler and
Hoffman (Gubler and Hoffman, 1983; Sambrook et al., 1989). These cloning methods, as
well as the methods of DNA sequencing and RNA sequencing, have been
described in numerous publications from our laboratory over the years (Hahn et
al., 1985; Rice et al., 1985; Rice and Strauss, 1981; Shirako et al., 1991; Strauss et
al., 1984).

In  a  second  approach,  we  developed  methods  suitable  for  high  throughput 
automated  DNA  sequencing,  in  order  to  speed  up  the  acquisition  of  sequence  data. 
Whataroa  virus  was  chosen  as  a  test  virus.  First  strand  cDNA  synthesis  used 
random  priming  and  second  strand  cDNA  was  synthesized  by  the  method  of 
Gubler  and  Hoffman  (Gubler  and  Hoffman,  1983).  After  blunt  ending  the  double- 
stranded  cDNA,  the  internal  EcoRI  sites  were  methylated  and  the  DNA  was 
electrophoresed  in  an  agarose  gel.  EcoRI  linkers  were  attached  to  the  2-4  kb 
fraction  and  the  DNA  cloned  in  the  EcoRI  site  of  a  suitable  vector.  One  hundred 
clones  that  resulted  from  this  cloning  were  characterized  by  restriction  analysis 
and  many  of  them  were  sequenced  using  an  Applied  Biosystems  automated  DNA 
sequencer. 


Sequence  Analysis  of  Whataroa  Virus. 

In  our  report  of  April  24th  of  this  year,  we  reported  the  sequence  of  nsP3 
and  of  nsP4  of  Whataroa  virus.  Most  of  the  sequence  of  Whataroa  virus  RNA,  11.7 
kb,  has  now  been  obtained.  This  sequence  is  being  assembled  to  give  the  complete 
sequence  of  this  virus  RNA.  The  sequences  of  two  stretches  of  the  nonstructural 
protein  coding  region  of  the  genome  are  shown  in  Figs.  1  and  2  as  an  example  of 
this  assembly  process.  Fig.  1  shows  the  sequence  of  about  1000  nucleotides 

Figure 1. Translated nucleotide sequence from the 5' terminal region of the genomic
RNA of Whataroa virus, using the single letter amino acid code. The open reading
frame begins with the ATG codon (nt 31-33). The exact 5' terminus of the RNA has
not been determined.

   1  FINRKLYHIAVHGPAKNTEE  20
   1  TTCATTAACAGGAAATTGTACCACATTGCAGTTCATGGTCCCGCGAAGAATACTGAGGAA  60

  21  EQYKAMRAEAADTEYVFDVD  40
  61  GAGCAGTATAAAGCTATGAGAGCAGAAGCGGCGGACACCGAATATGTCTTCGATGTCGAC  120

  41  KKKCVKREEASGLVLVGELT  60
 121  AAGAAGAAGTGCGTTAAGAGAGAAGAAGCATCGGGTCTTGTGTTAGTAGGCGAACTTACC  180

  61  NPPYHEMALEGLKTRPAVPY  80
 181  AACCCGCCATACCATGAAATGGCGCTGGAAGGGCTGAAGACCCGTCCTGCAGTACCTTAT  240

  81  KVETIGVIGTPGSGKSAIIK  100
 241  AAAGTTGAAACAATCGGAGTCATCGGCACACCGGGATCCGGAAAATCCGCAATCATTAAA  300

 101  NIVTTRDLVTSGKKENCREI  120
 301  AACATCGTCACTACCAGGGATCTTGTGACCAGCGGAAAGAAAGAAAACTGCCGGGAAATA  360

 121  EADVLKHRKMQIVSKTVDSV  140
 361  GAAGCTGACGTCCTCAAACACCGAAAAATGCAAATCGTTTCAAAGACGGTCGACTCCGTT  420

 141  LLNGCHKSVDILYVDEAYAC  160
 421  TTGCTTAATGGTTGCCACAAGTCAGTCGACATCCTGTATGTCGACGAAGCTTACGCGTGC  480

 161  HAGTLLALIAIVRPRNKVVL  180
 481  CACGCTGGCACCCTATTGGCCTTAATCGCCATAGTCCGACCTAGAAATAAAGTGGTCCTA  540

 181  CGDPKQCGFFNMMQLKVHFN  200
 541  TGTGGCGACCCAAAACAGTGTGGTTTCTTCAACATGATGCAGCTGAAGGTCCACTTTAAC  600

 201  DPERDICTKTFYKYISRRCT  220
 601  GACCCTGAACGCGACATTTGCACGAAGACGTTCTACAAATACATTTCTCGTCGGTGCACG  660

 221  QPVTAIVSTLHYNGKMRTTN  240
 661  CAACCGGTGACAGCAATTGTGTCTACACTGCACTATAACGGAAAAATGCGCACCACCAAC  720

 241  PCNKNIVIDITGQTKPKPGD  260
 721  CCATGTAACAAGAACATCGTAATCGACATTACCGGACAAACCAAACCAAAACCAGGAGAT  780

 261  IILTCFRGWVKQLQIEYPGH  280
 781  ATTATCCTGACGTGTTTCAGGGGGTGGGTCAAGCAGCTGCAGATTGAATACCCAGGACAC  840

 281  EVMTAAVSQGLTRKGVFPVR  300
 841  GAAGTTATGACTGCGGCAGTTTCACAAGGATTGACGCGAAAAGGGGTCTTTCCCGTAAGA  900

 301  GKVNENPLYAITSEHVNVLL  320
 901  GGAAAAGTCAACGAGAACCCGTTATATGCCATCACTTCTGAGCACGTCAACGTACTGTTG  960

 321  TRTEDRIVWKTLQGDPWIKQ  340
 961  ACACGAACCGAAGATCGTATCGTGTGGAAAACGCTACAAGGAGACCCTTGGATAAAGCAG  1020

 341  LTNIPKGNFHATVEEWEAEH  360
1021  CTCACAAACATTCCAAAAGGCAACTTTCACGCCACCGTCGAAGAATGGGAGGCTGAACAC  1080


Figure 2. See legend on next page.


 361  KGIMEAITSPAPRSNPFSCK  380
1081  AAGGGAATAATGGAGGCTATCACTAGCCCGGCCCCCCGCAGCAACCCTTTCAGCTGTAAG  1140

 381  TNVCWAKALEPILSTAGISL  400
1141  ACAAACGTGTGCTGGGCGAAGGCACTAGAACCTATACTATCGACCGCTGCCATATCACTA  1200

 401  TGCQWADLFPQFEDDKPHSA  420
1201  ACTGGATGTCAGTGGGCAGATTTGTTTCCGCAATTTGAAGATGACAAACCACATTCGGCC  1260

 421  IYALDVICVKFFGMDLTSGI  440
1261  ATATACGCTCTAGACGTCATTTGCGTAAAGTTCTTTGGCATGGATTTAACTAGCGGCATA  1320

 441  FSKPLIPLTYHPAEGDRKTA  460
1321  TTTTCAAAACCGTTGATCCCATTGACTTATCACCCCGCCGAAGGGGACCGGAAGACAGCG  1380

 461  HWDNSPGQRKYGFDKAVVAE  480
1381  CACTGGGACAACAGTCCAGGCCAACGAAAGTACGGGTTTGACAAAGCCGTTGTAGCTGAA  1440

 481  LSRRFPVFCMADKGVQLDLQ  500
1441  TTGTCCCGCAGATTCCCAGTATTCTGCATGGCAGACAAAGGAGTGCAACTGGACCTACAG  1500

 501  TGRTRVV?SRFNLVPFNRNL  520
1501  ACGGGCCGNACGCGCGTAGTCNCGTCACGCTTCAACCTTGTGCCATTTAACAGAAATCTG  1560

 521  PHSLVPEYKTQTPGQLSAFI  540
1561  CCCCACTCGCTTGTCCCGGAGTATAAAACACAAACTCCAGGTCAGCTAAGCGCCTTTATC  1620

 541  RQFKQNTILLVSETPAEHST  560
1621  CGCCAGTTTAAACAAAACACCATCCTGCTTGTATCTGAAACACCTGCCGAACATTCCACC  1680

 561  KSVEWIAPLGTLGATKCYNL  580
1681  AAATCTGTGGAATGGATTGCACCGCTGGGTACGCTTGGAGCCACCAAATGCTATAATTTA  1740

 581  AFGFPPQSRYDLVIINIGTK  600
1741  GCATTCGGCTTTCCGCCTCAGTCGAGGTACGACCTAGTGATCATAAATATCGGTACAAAA  1800

 601  FRHHHYQQCEDHAATMKTLS  620
1801  TTCAGACACCACCACTATCAACAGTGCGAAGACCACGCCGCCACCATGAAGACACTGTCA  1860

 621  RSALNCLNPGGTLVVKAYGY  640
1861  CGTTCCGCCCTTAATTGCCTGAACCCGGGTGGCACATTGGTGGTAAAAGCATATGGCTAC  1920

 641  ADRNSEDIITALARKFVRVS  660
1921  GCGGACAGAAACAGTGAAGACATCATTACAGCCCTGGCACGAAAGTTCGTCAGGGTGTCC  1980

 661  AARPQCVSSNTEMYFIFRQL  680
1981  GCGGCCCGCCCACAGTGCGTCTCAAGCAATACAGAGATGTACTTCATTTTCAGACAACTG  2040

 681  DNSRTRQFTPHHLNCVVSSV  700
2041  GACAACAGCAGAACACGTCAATTCACACCTCATCACCTCAACTGCGTCGTTTCGTCAGTG  2100

 701  YEGTRDGVGA  710
2101  TACGAGGGAACAAGAGACGGAGTTGGTGCT  2130

Figure  2  continued.  Translated  nucleotide  sequence  of  Whataroa  virus  in  the 
region  encoding  nonstructural  protein  nsP2.  By  homology  with  Sindbis  virus, 
the  sequence  shown  begins  at  amino  acid  97  of  nsP2  and  continues  to  the 
nsP2/nsP3  cleavage  site. 
beginning in the 5' nontranslated region just upstream of the start codon of the
long open reading frame translated from the viral genomic RNA. The second
sequence of about 2000 nucleotides begins near the beginning of the nsP2 gene
and continues through to the end of the nsP2 region of the virus genome. As
stated, the remainder of the sequence has been obtained and is being assembled.
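
The translated sequences in Figs. 1 and 2 pair each nucleotide triplet with a single letter amino acid code. The following is a minimal Python sketch of that translation step, assuming the standard genetic code; the function name `translate` and the use of "?" for any codon containing an ambiguous base are illustrative choices, not the procedure used to prepare the figures. Note that Fig. 2 assigns R to the codon CGN, since all four CGN codons encode arginine, whereas this simple sketch reports "?" for it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt, start=0):
    """Translate a nucleotide string into single letter amino acid codes.

    `start` is the 0-based offset of the first codon. Codons that contain
    an ambiguous base (for example N) are reported as "?".
    """
    nt = nt.upper().replace("U", "T")
    protein = []
    for i in range(start, len(nt) - 2, 3):
        protein.append(CODON_TABLE.get(nt[i:i + 3], "?"))
    return "".join(protein)


# First 60 nucleotides of the Whataroa nsP2 sequence shown in Fig. 2.
first_row = "TTCATTAACAGGAAATTGTACCACATTGCAGTTCATGGTCCCGCGAAGAATACTGAGGAA"
print(translate(first_row))  # FINRKLYHIAVHGPAKNTEE
```

Running the same function over each 60-nucleotide row of a figure is a quick way to check that the amino acid line and the nucleotide line agree.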

Whataroa virus can clearly be considered to be a strain of Sindbis virus that
has spread to New Zealand. The amino acid sequence deduced from the
nucleotide sequence in Fig. 2 is compared to that of the AR339 strain of Sindbis
virus, isolated from Egypt in 1952, in Fig. 3. These amino acid sequences are 84%
identical. Furthermore, we have previously shown that strains of Sindbis virus
contain a 3' nontranslated region that is different from that of all other alphaviruses. It
contains three copies of a sequence that is conserved among Sindbis viruses,
separated by sequences that are poorly conserved (Shirako et al., 1991). From our
sequence data, we found that this characteristic 3' nontranslated region is
present in Whataroa virus.
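
A percent identity figure such as the one quoted above can be computed from a pairwise alignment by counting matching columns. The sketch below is a minimal illustration in Python, assuming two pre-aligned strings of equal length in which "." in the second sequence marks identity with the first (the convention of Fig. 3) and "-" marks a gap; the function name `percent_identity` and the short example strings are hypothetical and are not taken from the alignment itself.

```python
def percent_identity(reference, other):
    """Percent of aligned, ungapped columns at which the two sequences agree.

    A "." in `other` is treated as identity with `reference`; columns in
    which either sequence has a gap ("-") are skipped.
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(reference, other):
        if a == "-" or b == "-":
            continue  # gap column, not counted
        compared += 1
        if b == "." or a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical five-residue example: one substitution out of five positions.
print(percent_identity("FINRK", ".V..."))  # 80.0
```

Whether gap columns are excluded or counted as mismatches changes the result, so the convention should be stated alongside any reported value.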

Sequence  of  Aura  Virus. 

The sequence of essentially all of the Aura virus genome has also been
obtained and is being assembled. As an example of this assembly process, the
sequence of about 5000 nucleotides of Aura RNA in the nonstructural protein
coding region is shown in Fig. 4. This sequence begins in the 5' nontranslated
region and continues through nsP1, nsP2, and part of nsP3. Aura virus is closely
related to Sindbis virus. The amino acid sequences of Sindbis virus and of Aura
virus are compared in Fig. 5 for the region represented by the Aura sequence in
Fig. 4. The two sequences are 80% identical, illustrating that Aura is in fact a
Sindbis-like virus. We also found that the 3' nontranslated region of Aura RNA is
Sindbis-like. Thus Aura virus represents the first known example of a true

FINRKLYHIAVHGPAKNTEEEQYKAMRAEAADTEYVFDVDKKKCVKREEA

.  V . M . VTK  .  .  L  .  E . R  .  .  .  K  .  .  .

* * * * *

SGLVLVGELTNPPYHEMALEGLKTRPAVPYKVETIGVIGTPGSGKSAIIK

. S . L .

* * * * *

NIVTTRDLVTSGKKENCREIEADVLKHRKMQIVSKTVDSVLLNGCHKSVD

ST.  .A . RL.G. .  T . M  . A  .  E

* * * * *

ILYVDEAYACHAGTLLALIAIVRPRNKVVLCGDPKQCGFFNMMQLKVHFN

V . F . A . K . M .

* * * * *

DPERDICTKTFYKYISRRCTQPVTAIVSTLHYNGKMRTTNPCNKNIVIDI

H.  .K . D.  .  .K . K  .  .  ,  E  .  .  .

* * * * *

TGQTKPKPGDIILTCFRGWVKQLQIEYPGHEVMTAAVSQGLTRKGVFPVR

.  .A . . D . A . YA.  .

* * * * *

GKVNENPLYAITSEHVNVLLTRTEDRIVWKTLQGDPWIKQLTNIPKGNFH

Q . L . P . Q

* * * * *

ATVEEWEAEHKGIMEAITSPAPRSNPFSCKTNVCWAKALEPILSTAGISL

.  .  I  D . IA. .N. . T. . A . .A _ V.

* * * * *

TGCQWADLFPQFEDDKPHSAIYALDVICVKFFGMDLTSGIFSKPLIPLTY

. SE . A . I . L.  .  .QS .

* * * * *

HPAEGDRKTAHWDNSPGQRKYGFDKAVVAELSRRFPVFCMADKGVQLDLQ

.  -  . DSA.PV . T  .  .  .  .  Y  .  H  .  I A . QL  .  G  .  .  T .

* * * * *

TGRTRVV?SRFNLVPFNRNLPHSLVPEYKTQTPGQLSAFIRQFKQNTILL

. ISAQH  ....  V . A . EKQ  .  .  PVKK  .  LN  .  .  .  HHSV  .  V

* * * * *

VSETPAEHSTKSVEWIAPLGTLGATKCYNLAFGFPPQSRYDLVIINIGTK

.  .  .  EKI  .  APR  .RI . I.IA..D.N . A . F .

* * * * *

FRHHHYQQCEDHAATMKTLSRSALNCLNPGGTLVVKAYGYADRNSEDIIT

Y.N..F . L . S . VV.

* * * * *

ALARKFVRVSAARPQCVSSNTEMYFIFRQLDNSRTRQFTPHHLNCVVSSV

. D . L . I  .  .  .

* * * * *

YEGTRDGVGA


Figure  3.  Aligned  deduced  amino  acid  sequences  of  the  nonstructural  protein 
regions  of  Whataroa  virus  and  Sindbis  virus,  beginning  with  amino  acid  97  of 
Sindbis  virus  nsP2.  The  upper  sequence  in  each  case  is  Whataroa  virus,  and 
amino  acid  identity  in  the  Sindbis  sequence  is  indicated  with  a  dot. 

1
ACT AGT ACT TGT ACT ACA GAA TTA ACT GCC GTG TGC CGC CCG CTA AAC TAG CCC CAA TCA

61
TCG AAA ATG GAG AAA CCG ACA GTG CAC GTT GAC GTA GAC CCC CAA AGT CCG TTT GTG CTA
        met glu lys pro thr val his val asp val asp pro gln ser pro phe val leu

121/19
CAA CTG CAG AAG AGT TTC CCA CAA TTC GAG ATT GTG GCT CAG CAG GTC ACT CCG AAT GAC
gln leu gln lys ser phe pro gln phe glu ile val ala gln gln val thr pro asn asp

181/39
CAT GCT AAT GCC AGA GCT TTT TCG CAT CTG GCT AGT AAA CTG ATC GAA CAT GAG ATC CCC
his ala asn ala arg ala phe ser his leu ala ser lys leu ile glu his glu ile pro

241/59
ACC TCA GTT ACG ATC TTG GAC ATA GGA AGC GCA CCA GCT CGT AGA ATG TAT TCC GAG CAT
thr ser val thr ile leu asp ile gly ser ala pro ala arg arg met tyr ser glu his

301/79
AAG TAT CAC TGT GTG TGC CCC ATG CGT AGT CCT GAA GAC CCG GAC CGT CTT ATG AAT TAC
lys tyr his cys val cys pro met arg ser pro glu asp pro asp arg leu met asn tyr

361/99
GCA TCC CGA CTC GCA GAC AAA GCA GGG GAA ATT ACC AAC AAG AGG CTG CAT GAT AAA CTT
ala ser arg leu ala asp lys ala gly glu ile thr asn lys arg leu his asp lys leu

421/119
GCA GAC CTC AAG TCG GTC CTC GAG TCG CCG GAT GCT GAA ACT GGT ACC ATT TGT TTC CAC
ala asp leu lys ser val leu glu ser pro asp ala glu thr gly thr ile cys phe his

481/139
AAT GAC GTA ATA TGC CGT ACG ACA GCG GAG GTA TCA GTT ATG CAA AAT GTG TAT ATC AAT
asn asp val ile cys arg thr thr ala glu val ser val met gln asn val tyr ile asn

541/159
GCA CCT TCG ACC ATT TAC CAT CAG GCC CTA AAG GGA GTC AGA AAA CTG TAT TGG ATC GGG
ala pro ser thr ile tyr his gln ala leu lys gly val arg lys leu tyr trp ile gly

601/179
TTC GAT ACA ACG CAG TTT ATG TTC TCC TCG ATG GCA GGG TCG TAT CCG TCC TAC AAT ACT
phe asp thr thr gln phe met phe ser ser met ala gly ser tyr pro ser tyr asn thr

661/199
AAT TGG GCC GAT GAA AGG GTG CTG GAA GCG CGT AAT ATA GGC CTA TGT AGC ACG AAG CTG
asn trp ala asp glu arg val leu glu ala arg asn ile gly leu cys ser thr lys leu

721/219
AGA GAG GGT ACG ATG GGC AAA CTG TCT ACC TTC CGG AAA AAG GCC TTG AAA CCT GGA ACT
arg glu gly thr met gly lys leu ser thr phe arg lys lys ala leu lys pro gly thr

781/239
AAC GTG TAC TTC TCT GTC GGT TCG ACA CTC TAC CCT GAG AAT AGA GCG GAC CTG CAG AGT
asn val tyr phe ser val gly ser thr leu tyr pro glu asn arg ala asp leu gln ser

841/259
TGG CAC CTA CCA TCT GTG TTC CAC TTG AAA GGT AAA CAA TCC TTT ACG TGC CGC TGT GAT
trp his leu pro ser val phe his leu lys gly lys gln ser phe thr cys arg cys asp

901/279
ACG GCG GTT AAC TGC GAA GGA TAC GTA GTC AAG AAG ATC ACC ATC AGC CCC GGG ATC ACG
thr ala val asn cys glu gly tyr val val lys lys ile thr ile ser pro gly ile thr

961/299
GGG CGT GTC AAT CGG TAC ACT GTG ACT AAC AAC AGC GAG GGA TTC TTG CTG TGT AAG ATC
gly arg val asn arg tyr thr val thr asn asn ser glu gly phe leu leu cys lys ile

1021/319
ACA GAT ACG GTC AAA GGG GAG CGT GTA TCG TTC CCT GTC TGT ACG TAT ATT CCA CCT TCA
thr asp thr val lys gly glu arg val ser phe pro val cys thr tyr ile pro pro ser

1081/339
ATC TGT GAC CAA ATG ACA GGT ATA TTG GCC ACT GAT ATC CAA CCC GAA GAC GCG CAA AAG
ile cys asp gln met thr gly ile leu ala thr asp ile gln pro glu asp ala gln lys

Figure  4a.  See  legend  on  last  page  of  this  sequence 

1141/359
TTG CTG GTA GGA CTG AAC CAA CGC ATA GTC GTG AAC GGA AAA ACT AAT AGA AAC ACC AAC
leu leu val gly leu asn gln arg ile val val asn gly lys thr asn arg asn thr asn

1201/379
ACG ATG CAG AAC TAT CTC CTG CCC GCG GTG GCT ACA GGT CTG AGT AAA TGG GCC AAA GAA
thr met gln asn tyr leu leu pro ala val ala thr gly leu ser lys trp ala lys glu

1261/399
AGA AAG GCA GAC TGC AGT GAC GAG AAA CCA TTG AAT GTG AGA GAA CGC AAA CTA GCT TTC
arg lys ala asp cys ser asp glu lys pro leu asn val arg glu arg lys leu ala phe

1321/419
GGT TGC CTA TGG GCT TTC AAG ACC AAG AAG ATC CAT TCT TTT TAC CGC CCG CCA GGC ACG
gly cys leu trp ala phe lys thr lys lys ile his ser phe tyr arg pro pro gly thr

1381/439
CAG ACT ATA GTA AAA GTC GCA GCG GAA TTC AGT GCG TTC CCT ATG TCC TCG GTG TGG ACT
gln thr ile val lys val ala ala glu phe ser ala phe pro met ser ser val trp thr

1441/459
ACG TCA CTG CCA ATG TCA CTG AGA CAG AAA GTT AAA CTG CTT CTT GTA AAG AAA ACC AAT
thr ser leu pro met ser leu arg gln lys val lys leu leu leu val lys lys thr asn

1501/479
AAA CCG GTA GTC ACT ATT ACT GAC ACT GCG GTA AAA AAC GCA CAA GAG GCA TAT AAC GAA
lys pro val val thr ile thr asp thr ala val lys asn ala gln glu ala tyr asn glu

1561/499
GCC GTC GAG ACA GCA GAA GCG GAG GAG AAA GCG AAG GCC TTA CCT CCG CTG AAG CCG ACG
ala val glu thr ala glu ala glu glu lys ala lys ala leu pro pro leu lys pro thr

1621/519
GCA CCC CCT GTA GCG GAG GAC GTC AAA TGC GAG GTC ACC GAC CTG GTA GAC GAT GCG GGA
ala pro pro val ala glu asp val lys cys glu val thr asp leu val asp asp ala gly

1681/539
GCG GCC CTG GTC GAG ACG CCC CGG GGA AAG ATA AAA ATT ATC CCA CAG GAA GGG GAC GTG
ala ala leu val glu thr pro arg gly lys ile lys ile ile pro gln glu gly asp val

1741/559
CGT ATT GGT TCC TAC ACA GTC ATT TCT CCA GCG GCA GTC CTT AGA AAT CAA CAA CTG GAG
arg ile gly ser tyr thr val ile ser pro ala ala val leu arg asn gln gln leu glu

1801/579
CCA ATC CAC GAG TTA GCA GAG CAG GTG AAA ATT ATC ACG CAC GGT GGC CGA ACA GGC AGG
pro ile his glu leu ala glu gln val lys ile ile thr his gly gly arg thr gly arg

1861/599
TAT TCC GTC GAA CCT TAC GAT GCT AAG GTT CTC CTG CCA ACA GGA TGC CCC ATG TCC TGG
tyr ser val glu pro tyr asp ala lys val leu leu pro thr gly cys pro met ser trp

1921/619
CAA CAT TTC GCG GCC TTG AGC GAA AGC GCT ACG TTA GTC TAC AAT GAG AGA GAG TTC CTG
gln his phe ala ala leu ser glu ser ala thr leu val tyr asn glu arg glu phe leu

1981/639
AAC CGG AAA CTC CAT CAC ATC GCT ACG AAG GGT GCG GCA AAA AAC ACT GAG GAA GAA CAA
asn arg lys leu his his ile ala thr lys gly ala ala lys asn thr glu glu glu gln

2041/659
TAC AAA GTA TGC AAA GCT AAA GAC ACG GAT CAT GAG TAC GTA TAC GAC GTA GAT GCC AGA
tyr lys val cys lys ala lys asp thr asp his glu tyr val tyr asp val asp ala arg

2101/679
AAA TGC GTA AAA AGA GAG CAT GCA CAA GGG CTA GTA CTA GTT GGG GAA CTA ACT AAT CCG
lys cys val lys arg glu his ala gln gly leu val leu val gly glu leu thr asn pro

2161/699
CCT TAC CAC GAG CTG GCA TAC GAA GGA TTA CGT ACA CGA CCC GCT GCC CCT TAC CAT ATC
pro tyr his glu leu ala tyr glu gly leu arg thr arg pro ala ala pro tyr his ile

Figure 4b. See legend on last page of this sequence

2221/719
GAA ACA CTG GGG GTC ATT GGA ACA CCG GGG TCA GGT AAG TCG GCC ATC ATA AAA TCT ACG
glu thr leu gly val ile gly thr pro gly ser gly lys ser ala ile ile lys ser thr

2281/739
GTA ACA CTA AAA GAC CTC GTA ACT AGC GGT AAG AAA GAA AAT TGC AAA GAA ATA GAG AAT
val thr leu lys asp leu val thr ser gly lys lys glu asn cys lys glu ile glu asn

2341/759
GAC GTC CAG AAA ATG CGG GGA ATG ACT ATA GCT ACG AGA ACG GTA GAC TCG GTA CTT CTT
asp val gln lys met arg gly met thr ile ala thr arg thr val asp ser val leu leu

2401/779
AAT GGA TGG AAG AAA GCA GTA GAC GTC CTA TAT GTG GAT GAA GCG TTT GCA TGT CAT GCA
asn gly trp lys lys ala val asp val leu tyr val asp glu ala phe ala cys his ala

2461/799
GGC ACC TTA ATG GCA TTG ATT GCC ATT GTC AAA CCG AGA CGT AAA GTA GTA CTG TGC GGC
gly thr leu met ala leu ile ala ile val lys pro arg arg lys val val leu cys gly

2521/819
GAC CCG AAG CAG TGG CCC TTC TTT AAT TTA ATG CAA CTG AAG GTA AAC TTC AAC AAC CCC
asp pro lys gln trp pro phe phe asn leu met gln leu lys val asn phe asn asn pro

2581/839
GAG CGA GAC CTG TGT ACT TCC ACC CAT TAT AAA TAT ATC TCT CGC AGG TGC ACC CAA CCT
glu arg asp leu cys thr ser thr his tyr lys tyr ile ser arg arg cys thr gln pro

2641/859
GTT ACA GCC ATA GTG TCT ACA TTA CAC TAT GAC GGA AAG ATG AGG ACT ACG AAT CCC TGC
val thr ala ile val ser thr leu his tyr asp gly lys met arg thr thr asn pro cys

2701/879
AAA AGG GCT ATC GAA ATA GAC GTA AAC GGA TCG ACT AAG CCC AAG AAA GGA GAC ATA GTG
lys arg ala ile glu ile asp val asn gly ser thr lys pro lys lys gly asp ile val

2761/899
TTG ACG TGT TTC CGT GGG TGG GTT AAG CAG GGG CAA ATC GAT TAC CCC GGA CCC GGA GGT
leu thr cys phe arg gly trp val lys gln gly gln ile asp tyr pro gly pro gly gly

2821/919
CAT GAC CGT GCA GCT TCT CAA GGG CTA ACC AGA AGG GGC GTT TAT GCG GTC AGA CAG AAA
his asp arg ala ala ser gln gly leu thr arg arg gly val tyr ala val arg gln lys

2881/939
GTA AAT GAA AAC CCA CTA TAT GCA GAG AAG TCA GAA CAC GTT AAC GTG TTA CTT ACT AGG
val asn glu asn pro leu tyr ala glu lys ser glu his val asn val leu leu thr arg

2941/959
ACG GAA GAT CGC ATA GTG TGG AAG ACA CTG CAA GGG GAT CCT TGG ATT AAG TAC CTC ACT
thr glu asp arg ile val trp lys thr leu gln gly asp pro trp ile lys tyr leu thr

3001/979
AAC GTT CCA AAA GGG AAC TTT ACA GCC ACT TTA GAA GAA TGG CAG GCG GAA CAC GAG GAC
asn val pro lys gly asn phe thr ala thr leu glu glu trp gln ala glu his glu asp

3061/999
ATT ATG AAG GCC ATT AAT TCT ACA TCC ACA GTA TCT GAC CCT TTC GCC AGC AAA GTG AAT
ile met lys ala ile asn ser thr ser thr val ser asp pro phe ala ser lys val asn

3121/1019
ACA TGC TGG GCT AAA GCT ATT ATA CCC ATC CTA AGA ACG GCA GGG ATA GAA CTT ACA TTC
thr cys trp ala lys ala ile ile pro ile leu arg thr ala gly ile glu leu thr phe

3181/1039
GAG CAG TGG GAA GAT CTA TTC CCG CAA TTT CGT AAT GAC CAA CCT TAC TCC GTG ATG TAT
glu gln trp glu asp leu phe pro gln phe arg asn asp gln pro tyr ser val met tyr

3241/1059
GCC CTA GAT GTG ATA TGT ACC AAG ATG TTC GGC ATG GAT CTG AGC AGT GGG ATC TTC TCT
ala leu asp val ile cys thr lys met phe gly met asp leu ser ser gly ile phe ser

3301/1079
CGT CCT GAG ATA CCT CTA ACG TTC CAT CCC GCG GAC GTC GGC CGA GTG AGA GCT CAC TGG
arg pro glu ile pro leu thr phe his pro ala asp val gly arg val arg ala his trp

Figure  4c.  See  legend  on  last  page  of  this  sequence 
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3361/1099 

GAT 

AAC 

TCC 

CCA 

GGA 

GGG 

CAG 

AAG 

TTT 

GGG 

TAT 

AAC 

AAG 

GCG 

GTA 

ATC 

CCA 

ACT 

TGC 

AAG 

asp 

asn 

ser 

pro 

gly 

gly 

gin 

lys 

phe 

gly 

tyr 

asn 

lys 

ala 

val 

i  le 

pro 

thr 

cys 

lys 

3421/1119 

AAA 

TAC 

CCA 

GTG 

TAC 

TTA 

AGA 

GCA 

GGA 

AAA 

GGG 

GAC 

CAA 

ATA 

CTC 

CCC 

ATA 

TAT 

GGC 

AGA 

lys 

tyr 

pro 

val 

tyr 

leu 

arg 

ala 

gly 

lys 

gly 

asp 

gin 

ile 

leu 

pro 

ile 

tyr 

gly 

a  rg 

3481/1139 

GTT 

TCA 

GTC 

CCA 

TCG 

GCA 

CGG 

AAC 

AAT 

TTA 

GTT 

CCC 

TTA 

AAC 

AGA 

AAT 

CTA 

CCA 

CAC 

TCG 

val 

ser 

val 

pro 

ser 

ala 

arg 

asn 

asn 

leu 

val 

pro 

leu 

asn 

arg 

asn 

leu 

pro 

his 

ser 

3541/1159 

CTA 

ACT 

GCA 

AGC 

CTG 

CAG 

AAA 

AAA 

GAA 

GCA 

GCT 

CCC 

TTG 

CAC 

AAG 

TTC 

CTT 

AAC 

CAA 

CTA 

leu 

thr 

ala 

ser 

leu 

gin 

ly3 

lys 

glu 

ala 

ala 

pro 

leu 

his 

lys 

phe 

leu 

asn 

gin 

leu 

3601/1179 

CCA 

GGA 

CAC 

AGT 

ATG 

CTG 

CTG 

GTC 

TCT 

AAG 

GAA 

ACA 

TGC 

TAT 

TGC 

GTG 

TCC 

AAG 

CGA 

ATC 

pro 

gly 

his 

ser 

met 

leu 

leu 

val 

ser 

lys 

glu 

thr 

cys 

tyr 

cys 

val 

ser 

lys 

arg 

ile 

3661/1199 

ACA 

TGG 

GTC 

GCT 

CCG 

CTG 

GGA 

GTC 

AGA 

GGA 

GCT 

GAC 

CAC 

AAC 

CAT 

GAC 

CTG 

CAT 

TTC 

GGG 

thr 

trp 

val 

ala 

pro 

leu 

gly 

val 

arg 

gly 

ala 

asp 

his 

asn 

his 

asp 

leu 

his 

phe 

gly 

3721/1219
TTC CCA CCA CTG TCC AGA TAC GAC CTT GTG GTG GTT AAT ATG GGA CAA CCG TAC AGG TTC
phe pro pro leu ser arg tyr asp leu val val val asn met gly gln pro tyr arg phe

3781/1239
CAT CAC TAC CAG CAG TGC GAG GAG CAT GCC GGC CTC ATG AGG ACG TTG GCC CGG TCA GCA
his his tyr gln gln cys glu glu his ala gly leu met arg thr leu ala arg ser ala

3841/1259
CTC AAC TGC CTA AAA CCA GGA GGA ACA TTA GCC CTG AAA GCA TAT GGT TTC GCC GAC TCC
leu asn cys leu lys pro gly gly thr leu ala leu lys ala tyr gly phe ala asp ser

3901/1279
AAT AGT GAG GAC GTT GTT CTG TCT TTA GCG AGG AAA TTC GTG CGG GCA TCC GCA GTG AGA
asn ser glu asp val val leu ser leu ala arg lys phe val arg ala ser ala val arg

3961/1299
CCA TCG TGT ACA CAG TTT AAC ACA GAG ATG TTC TTT GTA TTT AGG CAG CTG GAC AAC GAT
pro ser cys thr gln phe asn thr glu met phe phe val phe arg gln leu asp asn asp

4021/1319
CGT GAG CGC CAA TTC ACT CAG CAT CAC TTG AAT TTA GCA GTA TCC AAT ATA TTC GAC AAT
arg glu arg gln phe thr gln his his leu asn leu ala val ser asn ile phe asp asn

4081/1339
TAT AAA GAC GGA TCC GGA GCA GCT CCT TCT TAT CGC GTT AAG AGA ATG AAT ATC GCA GAC
tyr lys asp gly ser gly ala ala pro ser tyr arg val lys arg met asn ile ala asp

4141/1359
TGC ACA GAA GAA GCA GTG GTG AAC GCA GCT AAC GCG CGG GGA AAA CCT GGG GAC GGA GTA
cys thr glu glu ala val val asn ala ala asn ala arg gly lys pro gly asp gly val

4201/1379
TGC AGA GCT ATC TTC AAA AAG TGG CCG AAG TCA TTT GAG AAC GCT ACC ACT GAA GTG GAA
cys arg ala ile phe lys lys trp pro lys ser phe glu asn ala thr thr glu val glu

4261/1399
ACC GCG GTC ATG AAA CCA TGC CAC AAC AAG GTT GTT ATA CAT GCA GTG GGT CCT GAT TTT
thr ala val met lys pro cys his asn lys val val ile his ala val gly pro asp phe

4321/1419
AGA AAG TAC ACG TTG GAG GAA GCG ACG AAG CTA CTG CAG AAC GCA TAC CAT GAT GTG GCA
arg lys tyr thr leu glu glu ala thr lys leu leu gln asn ala tyr his asp val ala

4381/1439
AAG ATA GTG AAC GAG AAA GGC ATC TCC TCG GTA GCT ATA CCG CTG CTC TCA ACA GGT ATC
lys ile val asn glu lys gly ile ser ser val ala ile pro leu leu ser thr gly ile

4441/1459
TAT GCT GCC GGA GCT GAT CGC CTG GAT CTC TCG CTG AGA TGT CTT TTC ACC GCG CTG GAT
tyr ala ala gly ala asp arg leu asp leu ser leu arg cys leu phe thr ala leu asp

Figure  4d.  See  legend  on  last  page  of  this  sequence 
4501/1479
CGT ACG GAT GCG GAT GTC ACA ATA TAT TGC CTA GAT AAG AAG TGG GAG CAA CGC ATA GCA
arg thr asp ala asp val thr ile tyr cys leu asp lys lys trp glu gln arg ile ala

4561/1499
GAT GCT ATT AGG ATG CGA GAA CAA GTA ACT GAA TTA AAA GAT CCG GAC ATA GAG ATA GAT
asp ala ile arg met arg glu gln val thr glu leu lys asp pro asp ile glu ile asp

4621/1519
GAA GGA TTA ACC CGG GTA CAC CCA GAT AGC TGC CTC AAG GAT CAC ATA GGC TAC AGT ACC
glu gly leu thr arg val his pro asp ser cys leu lys asp his ile gly tyr ser thr

4681/1539
CAG TAT GGG AAA TTG TAC TCA TAC TTT GAA GGT ACT AAA TTC CAC CAA ACC GCA AAA GAC
gln tyr gly lys leu tyr ser tyr phe glu gly thr lys phe his gln thr ala lys asp

4741/1559
ATA GCC GAG ATT CGT GCG CTG TTT CCT GAT GTA CAA GCC GCT AAC GAA CAA ATC TGC CTG
ile ala glu ile arg ala leu phe pro asp val gln ala ala asn glu gln ile cys leu

4801/1579
TAC ACT TTA GGC GAA CCG ATG GAG TCC ATA CGC GAA AAG TGC CCA GTC GAA GAC TCC CCG
tyr thr leu gly glu pro met glu ser ile arg glu lys cys pro val glu asp ser pro

4861/1599
GCA TCA GCA CCT CCT AAG ACA ATA CCT TGC CTA TGT ATG TAT GCT ATG ACA GCC GAA CGT
ala ser ala pro pro lys thr ile pro cys leu cys met tyr ala met thr ala glu arg

4921/1619
ATT TGC CGC GTA CGC AGT AAC TCC GTA ACG AAC ATA ACG GTG TGC TCA TCC TTT CCG TTA
ile cys arg val arg ser asn ser val thr asn ile thr val cys ser ser phe pro leu

4981/1639
CCC AAG TAC CGA ATA AAG AAC GTA CAA AAG ATA CAG TGC ACG AAA GTG
pro lys tyr arg ile lys asn val gln lys ile gln cys thr lys val

Figure 4. Translated sequence of Aura virus. This sequence starts near the 5' terminus of the
genome, although the exact 5' end is not known. The translated sequence shown
encompasses nsP1, nsP2, and the N-terminal (conserved) region of nsP3. Nucleotides are
numbered from the beginning of the sequence; amino acids are numbered from the
beginning of the open reading frame.
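
The two numbering schemes in the Figure 4 row labels can be related with a few lines of Python. The sketch below is illustrative only: it assumes the standard genetic code, and the names `translate`, `residue_number`, and `orf_start` are ours rather than part of the original analysis. It translates the row labelled 3361/1099 and uses that label pair to infer the nucleotide at which the open reading frame would begin.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides):
    """Translate a cDNA-sense sequence codon by codon (spaces are ignored)."""
    seq = nucleotides.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def residue_number(nt_position, orf_start):
    """Residue number of the codon whose first base is nt_position (1-based)."""
    offset = nt_position - orf_start
    if offset < 0 or offset % 3:
        raise ValueError("position is not the first base of an in-frame codon")
    return offset // 3 + 1


# Row labelled 3361/1099 in Figure 4.
row_3361 = ("GAT AAC TCC CCA GGA GGG CAG AAG TTT GGG "
            "TAT AAC AAG GCG GTA ATC CCA ACT TGC AAG")
protein_3361 = translate(row_3361)

# ORF start implied by that label pair (an inference from the labels, not a stated value).
orf_start = 3361 - 3 * (1099 - 1)

print(protein_3361)
print(orf_start, residue_number(4981, orf_start))
```

Applying the inferred start to the last row label (nucleotide 4981) gives a simple consistency check on the figure's numbering.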
  23  MEKPTVHVDVDPQSPFVLQLQKSFPQFEIVAQQVTPNDHANARAFSHLAS    72
   1  MEKPVVNVDVDPQSPFVVQLQKSFPQFEVVAQQVTPNDHANARAFSHLAS    50

  73  KLIEHEIPTSVTILDIGSAPARRMYSEHKYHCVCPMRSPEDPDRLMNYAS   122
  51  KLIELEVPTTATILDIGSAPARRMFSEHQYHCVCPMRSPEDPDRMMKYAS   100

 123  RLADKAGEITNKRLHDKLADLKSVLESPDAETGTICFHNDVICRTTAEVS   172
 101  KLAEKACKITNKNLHEKIKDLRTVLDTPDAETPSLCFHNDVTCNMRAEYS   150

 173  VMQNVYINAPSTIYHQALKGVRKLYWIGFDTTQFMFSSMAGSYPSYNTNW   222
 151  VMQDVYINAPGTIYHQAMKGVRTLYWIGFDTTQFMFSAMAGSYPAYNTNW   200

 223  ADERVLEAENIGLCSTKLREGTMGKLSTFRKKALKPGTNVYFSVGSTLYP   272
 201  ADEKVLEARNIGLCSTKLSEGRTGKLSIMRKKELKPGSRVYFSVGSTLYP   250

 273  ENRADLQSWHLPSVFHLKGKQSFTCRCDTAVNCEGYVVKKITISPGITGR   322
 251  EHRASLQSWHLPSVFHLNGKQSYTCRCDTVVSCEGYVVKKITISPGITGE   300

 323  VNRYTVTNNSEGFLLCKITDTVKGERVSFPVCTYIPPSICDQMTGILATD   372
 301  TVGYAVTHNSEGFLLCKVTDTVKGERVSFPVCTYIPATICDQMTGIMATD   350

 373  IQPEDAQKLLVGLNQRIVVNGKTNRNTNTMQNYLLPAVATGLSKWAKERK   422
 351  ISPDDAQKLLVGLNQRIVINGRTNRNTNTMQNYLLPIIAQGFSKWAKERK   400

 423  ADCSDEKPLNVRERKLAFGCLWAFKTKKIHSFYRPPGTQTIVKVAAEFSA   472
 401  DDLDNEKMLGTRERKLTYGCLWAFRTKKVHSFYRPPGTQTCVKVPASFSA   450

 473  FPMSSVWTTSLPMSLRQKVKLLLVKKTNKPVVTITDTAVKNAQEAYNEAV   522
 451  FPMSSVWTTSLPMSLRQKLKLALQPKKEEKLLQVSEELVMEAKAAFEDAQ   500

 523  ETAEAEEKAKALPPLKP.TAPPVAEDVKCEVTDLVDDAGAALVETPRGKI   571
 501  EEARAEKLREALPPLVADKGIEAAAEVVCEVEGLQADIGAALVETPRGHV   550

Figure 5a. See legend on last page of this sequence.
 572  KIIPQEGDVRIGSYTVISPAAVLRNQQLEPIHELAEQVKIITHGGRTGRY   621
 551  RIIPQANDRMIGQYIVVSPNSVLKNAKLAPAHPLADQVKIITHSGRSGRY   600

 622  SVEPYDAKVLLPTGCPMSWQHFAALSESATLVYNEREFLNRKLHHIATKG   671
 601  AVEPYDAKVLMPAGGAVPWPEFLALSESATLVYNEREFVNRKLYHIAMHG   650

 672  AAKNTEEEQYKVCKAKDTDHEYVYDVDARKCVKREHAQGLVLVGELTNPP   721
 651  PAKNTEEEQYKVTKAELAETEYVFDVDKKRCVKKEEASGLVLSGELTNPP   700

 722  YHELAYEGLRTRPAAPYHIETLGVIGTPGSGKSAIIKSTVTLKDLVTSGK   771
 701  YHELALEGLKTRPAVPYKVETIGVIGTPGSGKSAIIKSTVTARDLVTSGK   750

 772  KENCKEIENDVQKMRGMTIATRTVDSVLLNGWKKAVDVLYVDEAFACHAG   821
 751  KENCREIEADVLRLRGMQITSKTVDSVMLNGCHKAVEVLYVDEAFACHAG   800

 822  TLMALIAIVKPRRKVVLCGDPKQWPFFNLMQLKVNFNNPERDLCTSTHYK   871
 801  ALLALIAIVRPRKKVVLCGDPMQCGFFNMMQLKVHFNHPEKDICTKTFYK   850

 872  YISRRCTQPVTAIVSTLHYDGKMRTTNPCKRAIEIDVNGSTKPKKGDIVL   921
 851  YISRRCTQPVTAIVSTLHYDGKMKTTNPCKKNIEIDITGATKPKPGDIIL   900

 922  TCFRGWVKQGQIDYPGPGGHDRAASQGLTRRGVYAVRQKVNENPLYAEKS   971
 901  TCFRGWVKQLQIDYPGHEVMTAAASQGLTRKGVYAVRQKVNENPLYAITS   950

 972  EHVNVLLTRTEDRIVWKTLQGDPWIKYLTNVPKGNFTATLEEWQAEHEDI  1021
 951  EHVNVLLTRTEDRLVWKTLQGDPWIKQPTNIPKGNFQATIEDWEAEHKGI  1000

1022  MKAINSTSTVSDPFASKVNTCWAKAIIPILRTAGIELTFEQWEDLFPQFR  1071
1001  IAAINSPTPRANPFSCKTNVCWAKALEPILATAGIVLTGCQWSELFPQFA  1050

1072  NDQPYSVMYALDVICTKMFGMDLSSGIFSRPEIPLTFHPADVGRVRAHWD  1121
1051  DDKPHSAIYALDVICIKFFGMDLTSGLFSKQSIPLTYHPADSARPVAHWD  1100

1122  NSPGGQKFGYNKAVIPT.CKKYPVYLRAGKGDQILPIYGRVSVPSARNNL  1170
1101  NSPGTRKYGYDHAIAAELSRRFPVFQLAGKGTQLDLQTGRTRVISAQHNL  1150

1171  VPLNRNLPHSLTASLQKKEAAPLHKFLNQLPGHSMLLVSKETCYCVSKRI  1220
1151  VPVNRNLPHALVPEYKEKQPGPVKKFLNQFKHHSVLVVSEEKIEAPRKRI  1200

Figure 5b, continued. See legend on next page.
1221  TWVAPLGVRGADHNHDLHFGFPPLSRYDLVVVNMGQPYRFHHYQQCEEHA  1270
1201  EWIAPIGIAGADKNYNLAFGFPPQARYDLVFINIGTKYRNHHFQQCEDHA  1250

1271  GLMRTLARSALNCLKPGGTLALKAYGFADSNSEDVVLSLARKFVRASAVR  1320
1251  ATLKTLSRSALNCLNPGGTLVVKSYGYADRNSEDVVTALARKFVRVSAAR  1300

1321  PSCTQFNTEMFFVFRQLDNDRERQFTQHHLNLAVSNIFDNYKDGSGAAPS  1370
1301  PDCVSSNTEMYLIFRQLDNSRTRQFTPHHLNCVISSVYEGTRDGVGAAPS  1350

1371  YRVKRMNIADCTEEAVVNAANARGKPGDGVCRAIFKKWPKSFENATTEVE  1420
1351  YRTKRENIADCQEEAVVNAANPLGRPGEGVCRAIYKRWPTSFTDSATETG  1400

1421  TAVMKPCHNKVVIHAVGPDFRKYTLEEATKLLQNAYHDVAKIVNEKGISS  1470
1401  TARMTVCLGKKVIHAVGPDFRKHPEAEALKLLQNAYHAVADLVNEHNIKS  1450

1471  VAIPLLSTGIYAAGADRLDLSLRCLFTALDRTDADVTIYCLDKKWEQRIA  1520
1451  VAIPLLSTGIYAAGKDRLEVSLNCLTTALDRTDADVTIYCLDKKWKERID  1500

1521  DAIRMREQVTELKDPDIEIDEGLTRVHPDSCLKDHIGYSTQYGKLYSYFE  1570
1501  AALQLKESVTELKDEDMEIDDELVWIHPDSCLKGRKGFSTTKGKLYSYFE  1550

1571  GTKFHQTAKDIAEIRALFPDVQAANEQICLYTLGEPMESIREKCPVEDSP  1620
1551  GTKFHQAAKDMAEIKVLFPNDQESNEQLCAYILGETMEAIREKCPVDHNP  1600

1621  ASAPPKTIPCLCMYAMTAERICRVRSNSVTNITVCSSFPLPKYRIKNVQK  1670
1601  SSSPPKTLPCLCMYAMTPERVHRLRSNNVKEVTVCSSTPLPKHKIKNVQK  1650

1671  IQCTKV                                              1676
1651
Figure 5, continued. Alignment of the deduced amino acid sequences of Aura virus
(top line) and Sindbis virus (lower line) in the region encoding nsP1, nsP2,
and the N-terminal (conserved) domain of nsP3. Amino acid identities are
indicated with solid vertical lines; dots indicate functionally similar residues.
Sindbis-like virus in the Americas

We have previously shown that Western equine encephalitis virus, once thought to be
closely related to Sindbis virus, is in fact a recombinant virus in which most of the
genome was derived from Eastern equine encephalitis virus and only the surface
glycoproteins were derived from a Sindbis-like virus (Hahn et al., 1988). Furthermore,
Western equine encephalitis virus lacks the characteristic Sindbis 3' nontranslated region.

Aura virus is widely distributed in South America, having been isolated in
Brazil and in northern Argentina. Analysis of the data is not yet complete, but it
is possible that Aura virus represents the ancestral Sindbis-like virus and that it
was transmitted to the Old World, where it served as the founder of the Sindbis
viruses, as we previously postulated (Levinson et al., 1990). Aura virus
may also have served as one of the parents of Western equine encephalitis virus,
contributing its glycoproteins to this recombinant virus (Hahn et al., 1988).

Conclusions 

The Sindbis-like viruses, which are found throughout the Old World from
Northern Europe to Africa, India, the Philippines, and the Australasian region
including New Guinea, are a clearly identifiable group of viruses. They share a
minimum of 80% amino acid sequence identity in nsP2 and possess a
characteristic and conserved 3' nontranslated region. It is of considerable
interest that viruses belonging to this group coexist in many parts of the world
with other alphaviruses that are demonstrably different in their epidemiology,
serology, organization of the 3' nontranslated region, and evolutionary history,
even though most of these non-Sindbis alphaviruses cause diseases very similar
to those caused by the Sindbis-like viruses.
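
An identity figure of this kind can be computed directly from a pairwise alignment. The Python sketch below is a minimal illustration, not the procedure used to obtain the 80% value: the function name `percent_identity` and the convention of skipping gapped columns (written "." as in Figure 5) are our assumptions. The example pair is a single 50-residue row of the Figure 5 alignment (Aura residues 73-122, Sindbis residues 51-100), so its result describes that row only, not nsP2 as a whole.

```python
def percent_identity(top, bottom, gap="."):
    """Percent of ungapped alignment columns in which both rows carry the same residue."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same number of columns")
    compared = identical = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# One row of the Figure 5 alignment.
aura_73_122 = "KLIEHEIPTSVTILDIGSAPARRMYSEHKYHCVCPMRSPEDPDRLMNYAS"
sindbis_51_100 = "KLIELEVPTTATILDIGSAPARRMFSEHQYHCVCPMRSPEDPDRMMKYAS"

row_identity = percent_identity(aura_73_122, sindbis_51_100)
print(f"{row_identity:.1f}% identity over {len(aura_73_122)} columns")
```

Other conventions, such as counting gapped columns as mismatches, would give slightly different values, so the denominator should be stated whenever identities are compared between studies.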

We have shown that high-throughput automated DNA sequencing is well suited to
the rapid analysis of an RNA virus family such as the alphaviruses. These
procedures generate large amounts of useful information quickly and would be
very useful in defining the origin and spread of an epidemic virus.

We  have  shown  that  Aura  virus  is  a  New  World  representative  of  the 
Sindbis  viruses.  Further  analysis  is  required  to  determine  whether  it  is  one  of  the 
parents  of  Western  equine  encephalitis  virus,  but  the  hypothesis  that  Western 
equine  encephalitis  virus  is  an  emergent  virus  that  arose  by  recombination  has 
received  further  support  from  these  studies. 